On the creation of hypertext links in full-text documents: Measurement of retrieval effectiveness

Author(s):  
David Ellis ◽  
Jonathan Furner ◽  
Peter Willett
1994 ◽  
Vol 50 (2) ◽  
pp. 67-98 ◽  
Author(s):  
David Ellis ◽  
Jonathan Furner‐Hines ◽  
Peter Willett

Author(s):  
I. P. Komenda

The publication describes the initial stages of adding bibliographic records to the electronic catalogue of the Central Science Library of the NAS of Belarus for electronic periodicals from the eLIBRARY.RU platform and for the electronic serials to which the library subscribes. It also considers the work of adding full-text documents and the tables of contents of periodicals to bibliographic records.


1991 ◽  
Vol 25 (2) ◽  
pp. 119-131 ◽  
Author(s):  
Suliman Al‐Hawamdeh ◽  
Geoff Smith ◽  
Peter Willett

2014 ◽  
Vol 35 (4/5) ◽  
pp. 293-307
Author(s):  
Mark Edward Phillips ◽  
Daniel Gelaw Alemneh ◽  
Brenda Reyes Ayala

Purpose – Increasingly, higher education institutions worldwide are accepting only electronic versions of their students’ theses and dissertations. These electronic theses and dissertations (ETDs) frequently feature embedded URLs in the body, footnotes and references sections of the document. Additionally, the web as an ETD subject appears to be on an upward trajectory as the web becomes an increasingly important part of everyday life. The paper aims to discuss these issues. Design/methodology/approach – The authors analyzed URL references in 4,335 ETDs in the UNT ETD collection. Links were extracted from the full-text documents, cleaned and canonicalized, deconstructed into the subparts of a URL and then indexed with the full-text indexer Solr. Queries to aggregate and generate overall statistics and trends were run against the Solr index. The resulting data were analyzed for patterns and trends within a variety of groupings. Findings – The proportion of ETDs at the University of North Texas that include URL references has increased over the past 14 years, from 23 percent in 1999 to 80 percent in 2012. URLs are included in the majority of ETDs: 62 percent of the publications analyzed in this work contained URLs. Originality/value – This research establishes that web resources are widely cited in UNT's ETDs and that citation of these resources has grown. Further, it provides a preliminary framework of technical methods for analyzing similar data, which may be applicable to other sets of documents or subject areas.
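The extraction, cleaning, canonicalization and deconstruction steps summarized in this abstract can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the authors' pipeline. The regular expression, the trailing-punctuation rule, the canonicalization choices (lower-casing the host, dropping default ports and fragments) and the output field names are all hypothetical, and the Solr indexing step is omitted. In a fuller pipeline, each dictionary returned by `deconstruct` could become one indexed record.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical pattern: a simple regex for http(s)/ftp URLs in plain text.
# Real ETD text (line-wrapped or hyphenated URLs) would need more cleanup.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s<>\"']+", re.IGNORECASE)

# Hypothetical cleaning rule: trailing characters that usually belong to the
# surrounding sentence rather than the URL. Note that this also strips a
# legitimate closing parenthesis at the end of a URL.
TRAILING_PUNCT = ".,;:)]}'\""


def extract_urls(text):
    """Find URL-like strings and strip trailing sentence punctuation."""
    return [m.rstrip(TRAILING_PUNCT) for m in URL_PATTERN.findall(text)]


def canonicalize(url):
    """Lower-case scheme and host, drop default ports, user info and fragments."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only if it is not the default for the scheme.
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port is not None and not is_default:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # An empty fragment removes any "#anchor" part.
    return urlunsplit((scheme, host, path, parts.query, ""))


def deconstruct(url):
    """Split a canonical URL into hypothetical fields that could be indexed separately."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url": url,
        "scheme": parts.scheme,
        "host": host,
        "tld": host.rsplit(".", 1)[-1],
        "path": parts.path,
        "query": parts.query,
    }
```

For example, `extract_urls("See http://Example.COM:80/page.")` yields `["http://Example.COM:80/page"]`, which `canonicalize` turns into `"http://example.com/page"`. Dropping fragments merges links that differ only in an in-page anchor. That suits counting cited resources, but it loses anchor-level detail.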


2010 ◽  
Vol 7 (3) ◽  
pp. 400-411 ◽  
Author(s):  
Artemy Kolchinsky ◽  
Alaa Abi-Haidar ◽  
Jasleen Kaur ◽  
Ahmed Abdeen Hamed ◽  
Luis M Rocha

Author(s):  
Bapuji Rao

The chapter is about clustering text documents with graph mining techniques, given n input words and m text documents. The author proposes an algorithm that first selects the documents with the extension “.txt” from the m documents, which have various file extensions. The n words are then searched for in the selected “.txt” documents, and the algorithm builds n clusters of text documents based on the n input words. This is done by creating a document-word frequency matrix in memory. The frequency matrix is then converted into an un-oriented document-word incidence matrix by replacing all non-zero entries with 1s. Using this incidence matrix, the algorithm creates n clusters of text documents that contain from 1 to n of the input words, respectively. Finally, the n clusters are formed both word-wise and according to how many of the n input words (1 to n) each document contains.
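As a rough illustration, the steps described in this abstract can be sketched in Python. This is not the author's published algorithm. The in-memory `documents` dictionary (file name to text), the simple lower-case tokenizer and all function names are hypothetical. Reading the "1 to n" clusters as "documents containing exactly k of the n input words" is also an assumption. The binary incidence matrix can be viewed as the adjacency structure of a bipartite document-word graph.

```python
import re


def select_txt(documents):
    """Keep only documents whose (hypothetical) file name ends in '.txt'."""
    return {name: text for name, text in documents.items() if name.lower().endswith(".txt")}


def frequency_matrix(documents, words):
    """Document-word frequency matrix: one row per document, one column per input word."""
    matrix = {}
    for name, text in documents.items():
        # Assumed tokenizer: lower-case alphanumeric runs.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        matrix[name] = [tokens.count(w.lower()) for w in words]
    return matrix


def incidence_matrix(freq):
    """Replace every non-zero frequency with 1 to get a binary document-word incidence matrix."""
    return {name: [1 if f else 0 for f in row] for name, row in freq.items()}


def word_clusters(incidence, words):
    """Word-wise clusters: for each input word, the documents that contain it."""
    return {
        w: sorted(name for name, row in incidence.items() if row[j])
        for j, w in enumerate(words)
    }


def count_clusters(incidence, n):
    """Assumed reading: cluster k holds documents containing exactly k of the n words."""
    clusters = {k: [] for k in range(1, n + 1)}
    for name, row in sorted(incidence.items()):
        k = sum(row)
        if k:  # documents containing none of the words join no cluster
            clusters[k].append(name)
    return clusters
```

With `docs = {"a.txt": "graph mining of text", "b.txt": "text clustering text", "c.pdf": "graph text", "d.txt": "graph clustering text"}` and `words = ["graph", "text", "mining"]`, `c.pdf` is dropped at selection. The count-based clusters then come out as `{1: ["b.txt"], 2: ["d.txt"], 3: ["a.txt"]}`.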


2020 ◽  
Author(s):  
Bernhard Rieder

This chapter investigates early attempts in information retrieval to tackle the full text of document collections. Underpinning a large number of contemporary applications, from search to sentiment analysis, the concepts and techniques pioneered by Hans Peter Luhn, Gerard Salton, Karen Spärck Jones, and others involve particular framings of language, meaning, and knowledge. They also introduce some of the fundamental mathematical formalisms and methods running through information ordering, preparing the extension to digital objects other than text documents. The chapter discusses the considerable technical expressivity that comes out of the sprawling landscape of research and experimentation that characterizes the early decades of information retrieval. This includes the emergence of the conceptual construct and intermediate data structure that is fundamental to most algorithmic information ordering: the feature vector.

